Abstract: Intelligence is a capability typically ascribed to animals but not usually to plants.
Animals can move while plants cannot. Is mobility a necessary condition or driving force for the
emergence of intelligence? We hypothesize that mobility plays a foundational role in the evolution of
animal and human intelligence and is therefore fundamentally important for understanding and creating
embodied cognitive systems [1]. In this project, we aim to develop a new class of machine learning
algorithms for mobile cognitive systems that actively collect data by sensing and interacting with the
environment. We envision a new paradigm of autonomous AI that overcomes the previous AI
paradigms of top-down/rule-driven symbolic and bottom-up/data-driven statistical systems, inspired
by the dual process theory of mind [2]. We use mobile robot platforms to investigate the autonomous
learning algorithms and demonstrate their capability in real-world home environments.

Introduction: In the history of artificial intelligence (AI), two main approaches have emerged:
symbolic and statistical systems. The former approach, or first-generation AI, is deductive and relies
on rule-based programming. It can solve complex problems but has difficulty with learning and
adaptability. The latter approach, or second-generation AI, is inductive and relies on statistical
learning from big data. It cannot solve complex problems, and its learning speed is limited, so it
faces scalability issues. To create human-level artificial intelligence, we need a methodology that
combines the best of both approaches and also scales up to real complex problems.

Recent advancements in deep learning provide a crucial lesson in this direction, i.e., building more
expressive representations helps solve complex problems [3] [4]. This provides evidence for an earlier
prediction that “learning requires much more memory than we have thought to solve real-world
problems” [5]. Deep learning models use much more memory than previous machine learning models,
but they do not overfit because the data size has also increased. However, deep learning models are
very limited in their learning speed, flexibility, and robustness when applied to the dynamic
environments of mobile cognitive agents.

Why and how has the human brain evolved to learn so rapidly, flexibly, and robustly? We
hypothesize that the brain evolved these properties mainly to support mobility for the survival of its
body in hostile environments [1] [6]. In fact, the brain's main function is to make decisions and control
body motion. Higher functions like memory and planning evolved on top of this substrate.
Therefore, to achieve a truly human-level AI, it is important to study higher-level intelligence, such as
vision and language, on a mobile platform in a dynamic environment. It is our belief that fast, flexible,
and robust learning in interactive mobile environments will give rise to a new paradigm of machine
learning that will enable the next generation of autonomous AI systems.

In this project, the ultimate goal is to demonstrate a mobile personal robot that learns objects,
people, actions, events, episodes, and schedule plans over periods ranging from a day to extended
periods of time. In the basic year of the project, we built a multi-module integrated system for mobile
robots to perceive information (objects, people, actions) from the environment, act (schedule,
interact) according to the perceived information, and develop models that learn the dynamics of the
environment. We also demonstrated the integration of multimodal information in an interactive system
that efficiently infers and responds to the goals and plans of the observed environment.
Experiments and Results:

a)  Perception-Action-Learning  System  for  Mobile  Social-Service  Robots 

Making robots more human-like and capable of providing natural social services to
customers in dynamic environments such as houses, restaurants, hotels, and even airports
has been a challenging goal for researchers in the field of social-service robotics. One
promising approach is to develop an integrated system of methodologies from many different
research areas. Such multi-module integrated intelligent robotic systems have been widely
accepted, and their performance is well known from previous studies [7] [8]. However, because
each module plays an individual role in the integrated system, perception modules have mostly
suffered from desynchronization with one another and difficulty in adapting to dynamic
environments [9]. This occurs because of the different processing times and coverage scales of
the adopted vision techniques [10]. To overcome such difficulties, developers have usually upgraded
or added expensive sensors (hardware) to the robot to improve performance. Though this may
have provided some solutions to the limitations, current robot systems still have difficulty with
natural interaction in real-life, dynamic environments.

We address this problem by designing a system that incorporates state-of-the-art deep
learning methods and is inspired by the cognitive perception-action-learning cycle [11]. The
implemented novel and robust integrated system for mobile social-service robots, which requires
at least an RGB-D camera and an obstacle-detecting sensor (laser, bumper, or sonar), achieved
real-time performance on various social service tasks. By performing tasks robustly in real time,
the robot can also interact more naturally with people.


[Figure 1 diagram. Recoverable panel labels: Environment, Perception, Learning, Guiding,
Approach to Target with Best Position Selection]

Figure 1. Perception-Action-Learning system for mobile social-service robots
using deep learning


As illustrated in Figure 1, our system's perception-action-learning cycle works in real time
(~0.2 s/cycle), where the arrows indicate the flow between modules. The system was implemented
on a server with an i7 CPU, 32 GB RAM, and a GTX Titan GPU with 12 GB of memory. Communication
between the server and the robot was achieved using ROS topics, which were
passed over a 5 GHz Wi-Fi connection.
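
As a rough illustration of the cycle described above, the following sketch runs a perceive-act-learn loop and flags cycles that exceed a time budget. It is not the implementation used in this project: the function names, the stub modules, and the budget constant (taken from the ~0.2 s/cycle figure) are assumptions made for illustration, and the ROS topic transport between server and robot is omitted.

```python
import time

CYCLE_BUDGET_S = 0.2  # assumed budget, based on the ~0.2 s/cycle figure in the text


def run_cycles(perceive, act, learn, n_cycles, clock=time.monotonic):
    """Run n_cycles of perceive -> act -> learn and return per-cycle durations."""
    durations = []
    for _ in range(n_cycles):
        start = clock()
        observation = perceive()      # e.g. detections from an RGB-D frame
        outcome = act(observation)    # e.g. a motion or speech command
        learn(observation, outcome)   # update internal state from the result
        durations.append(clock() - start)
    return durations


def over_budget(durations, budget=CYCLE_BUDGET_S):
    """Return the indices of cycles that took longer than the budget."""
    return [i for i, d in enumerate(durations) if d > budget]


if __name__ == "__main__":
    memory = []  # hypothetical stand-in for a learned model
    durations = run_cycles(
        perceive=lambda: {"person_visible": True},
        act=lambda obs: "approach" if obs["person_visible"] else "search",
        learn=lambda obs, out: memory.append((obs, out)),
        n_cycles=5,
    )
    print(len(memory), over_budget(durations))
```

Passing the clock as a parameter keeps the loop testable without real sensors or real delays.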

The experiments followed the scenarios designed by the RoboCup@Home Committee and
described in the rulebook [12], and our system was able to perform all the scenarios in a
significantly improved way.


RoboCup2017@Home  Social  Standard  Platform  League  (SSPL)  Winning  First  Place 

We used our system on SoftBank Pepper, a standardized mobile social-service robot, and
achieved the highest score in every scenario performed at the RoboCup2017@Home Social
Standard Platform League (SSPL), winning first place overall.

Our system allows robots to perform social service tasks in real-life social situations with
high performance in real time. However, it does not yet meet every individual's
expectations for performance and processing speed. We therefore highlight the importance of
research on not only the individual elements but also the integration of the modules for developing
a more human-like, ideal robot to assist humans in the future. Related videos can be found at
https://goo.gl/Pxnfln and our open-sourced code at https://github.com/soseazi/pal pepper.


[Table 1] RoboCup2017@Home Social Standard Platform League (SSPL) Test 1 Result

| Team | Poster | Speech & Person | Cocktail Party | Help Me Carry | GPSR | Total | Rank |
|---|---|---|---|---|---|---|---|
| AUPAIR | 45.00 | 117.5 | 30 | 10 | 42.5 | 245.00 | 1 |
| UTS Unleashed | 33.33 | 85.5 | 27.5 | 0 | 17.5 | 163.83 | 2 |
| SPQReL | 41.67 | 32.5 | 10 | 5 | 7.5 | 96.67 | 3 |
| KameRider | 31.67 | 60 | 0 | 0 | 0 | 91.67 | 4 |
| UChilePeppers | 31.67 | 50 | 0 | 0 | 0 | 81.67 | 5 |
| UvA@Home | 20.00 | 47.5 | 0 | 0 | 0 | 67.50 | 6 |
| ToBI@Pepper | 41.25 | 17.5 | 7.5 | 0 | 0 | 66.25 | |

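
The Total column of Table 1 is the sum of the five test scores, which can be checked directly. The sketch below transcribes the scores from Table 1 (the variable and function names are illustrative) and recomputes each team's total and the resulting ranking.

```python
# Test scores transcribed from Table 1, in column order:
# Poster, Speech & Person, Cocktail Party, Help Me Carry, GPSR.
SCORES = {
    "AUPAIR": (45.00, 117.5, 30, 10, 42.5),
    "UTS Unleashed": (33.33, 85.5, 27.5, 0, 17.5),
    "SPQReL": (41.67, 32.5, 10, 5, 7.5),
    "KameRider": (31.67, 60, 0, 0, 0),
    "UChilePeppers": (31.67, 50, 0, 0, 0),
    "UvA@Home": (20.00, 47.5, 0, 0, 0),
    "ToBI@Pepper": (41.25, 17.5, 7.5, 0, 0),
}

# Totals as reported in the Total column of Table 1.
REPORTED_TOTALS = {
    "AUPAIR": 245.00,
    "UTS Unleashed": 163.83,
    "SPQReL": 96.67,
    "KameRider": 91.67,
    "UChilePeppers": 81.67,
    "UvA@Home": 67.50,
    "ToBI@Pepper": 66.25,
}


def totals(scores):
    """Sum each team's test scores, rounded to two decimals."""
    return {team: round(sum(vals), 2) for team, vals in scores.items()}


def ranking(scores):
    """Return team names ordered from highest to lowest total."""
    t = totals(scores)
    return sorted(t, key=t.get, reverse=True)


if __name__ == "__main__":
    for team, total in totals(SCORES).items():
        print(team, total, abs(total - REPORTED_TOTALS[team]) < 0.005)
```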
[Table 2] RoboCup2017@Home Social Standard Platform League (SSPL) Test 2 Result

| Team | Stage 1 | Open Challenge | Tour Guide | Restaurant | EE-GPSR | Total | Rank |
|---|---|---|---|---|---|---|---|
| AUPAIR | 245.00 | 178.47 | 95 | 40 | 70 | 628.47 | 1 |
| UTS Unleashed | 163.83 | 121.53 | 0 | 0 | 0 | 285.36 | 2 |
| SPQReL | 96.67 | 130.56 | 0 | 10 | 20 | 257.22 | 3 |
| KameRider | 91.67 | 136.81 | 0 | 15 | 0 | 243.47 | 4 |

b)  Integrated  Perception  Towards  Fully  Autonomous  General  Purpose  Service  Robots 

To interact with or assist people, service robots require a perception framework that can
provide information such as the location and type of objects and the identity, pose, and gender of
people in the environment. Many perception frameworks have been used in service robots. OpenCV
and OpenNI have been widely used to perform perception tasks such as object detection and human
pose estimation. However, these frameworks focus on only a few tasks, such as object detection or
face recognition. Furthermore, they use traditional vision methods that are known to be
vulnerable to illumination changes and object translation. They also lack a
reasoning engine that can build perceptual information and reason about it. Frameworks such as
RoboSherlock [13] [14] provide sophisticated reasoning engines on top of an integrated
perception pipeline, but they focus only on object manipulation and also use traditional
vision modules. These limitations often restrict service robots to
performing well only in well-defined tasks in controlled environments.

Recently, following the remarkable success of deep learning in object recognition [15], many
deep learning based perception models have been proposed. Deep learning based approaches
are known to be robust to illumination changes and translation and have achieved state-of-the-art
performance in many vision tasks such as object detection [16] [17] [18], image description
[19] [20], and pose estimation [21] [22]. These models outperform more
traditional approaches. However, they are not sufficient for complex and
realistic perception tasks, since they mostly focus on individual tasks such as object detection,
face detection, or object recognition. Furthermore, these models also lack reasoning engines
that can process perceptual information efficiently.

We propose the IPSRO (Integrated Perception for Service RObots) framework, a
ROS-friendly integrated perception system that we have recently open-sourced. IPSRO can
flexibly integrate several perception modules, including deep learning models, to extract rich and
useful perceptual information from the environment based on a unified perception
representation. On top of that, IPSRO can process the generated perceptual information to
perform complex perception tasks. We conducted experiments using the GPSR (General Purpose
Service Robot) task of RoboCup@Home. In the GPSR task, the robot has to execute arbitrary
voice commands. The commands include but are not limited to finding an object or person,
answering a question, following or guiding a person, counting the number of objects, and
describing a person or place.
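
To make the idea of a unified perception representation concrete, the sketch below stores every detection as one record type with a kind, a name, a location tag, and free-form attributes, and then answers counting and description commands by filtering those records. The `Percept` class, its fields, and the two helper functions are hypothetical names chosen for illustration; they are not the IPSRO API.

```python
from dataclasses import dataclass, field


@dataclass
class Percept:
    """One perceived entity, shared by all perception modules (hypothetical)."""
    kind: str        # "object" or "person"
    name: str        # class label, recognized identity, or "unknown"
    location: str    # semantic place tag, e.g. "table" or "living room"
    attributes: dict = field(default_factory=dict)


def count(percepts, kind, name, location):
    """Count percepts matching a kind, name, and location."""
    return sum(
        1 for p in percepts
        if p.kind == kind and p.name == name and p.location == location
    )


def describe_unknown_people(percepts, location):
    """Return attribute dictionaries of unidentified people at a location."""
    return [
        p.attributes for p in percepts
        if p.kind == "person" and p.name == "unknown" and p.location == location
    ]


if __name__ == "__main__":
    scene = [
        Percept("object", "coke bottle", "table"),
        Percept("object", "coke bottle", "table"),
        Percept("object", "cup", "table"),
        Percept("person", "unknown", "living room", {"pose": "sitting"}),
    ]
    print(count(scene, "object", "coke bottle", "table"))
    print(describe_unknown_people(scene, "living room"))
```

Because every module writes the same record type, a command such as counting objects on a table reduces to a filter over a single list.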

GPSR  (General  Purpose  Service  Robot)  Command  Executing  Experiments 


Figure 2. Visualization of perceptual information of the IPSRO framework
(Commands 1, 2, and 3 from left to right).

We conducted experiments using the GPSR task in a lab environment and in the RoboCup@Home
Social Standard Platform League. In the lab environment, we gave the following three commands to
the robot running the IPSRO framework:

1. Find James in the kitchen and guide him to entrance

2.  Tell  me  how  many  coke  bottles  are  on  the  table 

3.  Describe  unknown  people  in  the  living  room 

As seen in Figure 2, our framework successfully extracted all relevant tags from the camera
image and executed all commands correctly.

In the RoboCup@Home competition, the robot running our framework was given the following
three commands:

1. Say the time to Jacob at the kitchen table

2.  Find  a  hair  spray  at  the  kitchen  table 

3.  Tell  me  how  many  cokes  are  on  the  desk 

Our robot successfully executed all commands, achieving the highest GPSR score among the seven
teams (Table 1). The video of the GPSR competition can be found at goo.gl/fyRhtD


c)  Robust  Human  Following  by  Deep  Bayesian  Trajectory  Prediction  for  Home  Service 
Robots 


Figure  3.  Robust  following  for  robots  to  interact  with  people  in  a  home  environment 
Human following by a robot has been an ongoing research topic in the robotics community
[23], with annual robotic competitions [12] [24] that test following performance. To achieve
this ability, previous studies used vision techniques that capture a person's
characteristic features in order to detect and track that person. For example, SIFT [25], ORB [26],
and template matching [27] have been used in human tracking. However, these approaches have
several limitations under illumination changes, object translation, and sensor occlusion.
Moreover, the difficulty of separating a person in the foreground from the background made it
very demanding to maintain a following system at a consistent level of performance.

In contrast to the literature mentioned above, we combine the high recognition performance
of deep learning methods, the computational power of GPUs, and the widely
adopted ROS system to introduce a robust integrated system, called Deep Bayesian Trajectory
Prediction (DBTP), that lets home service robots follow a person in a home environment.
DBTP contributes 1) robust detection and identification of a person in real time (around
0.3 s) in a homelike environment with state-of-the-art performance, 2) following of the target with
contextual information for better collision avoidance, and 3) recovery of following: by recording
the person's coordinate trajectory in real time, the robot can predict the trajectory with
variational Bayesian linear regression (VBLR) [28] and keep following the person
when it fails to follow continuously or loses the target person.
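
The trajectory-prediction idea in contribution 3 can be sketched as follows. This is a minimal illustration, not the DBTP code: it uses the standard mean-field variational treatment of Bayesian linear regression with a Gamma prior on the weight precision and a fixed noise precision, and it assumes features that are linear in time, phi(t) = [1, t]. The values of `beta`, `a0`, and `b0`, the feature choice, and all function names are assumptions.

```python
def fit_vblr(ts, ys, beta=100.0, a0=1e-2, b0=1e-2, iters=20):
    """Variational Bayesian linear regression of y on features [1, t].

    Alternates the updates of q(w) = N(m, S) and q(alpha) = Gamma(a_n, b_n)
    and returns the posterior mean weights (m0, m1).
    beta is the assumed noise precision; (a0, b0) is the assumed Gamma prior.
    """
    n = len(ts)
    # Entries of Phi^T Phi and Phi^T y for phi(t) = [1, t].
    s_t = sum(ts)
    s_tt = sum(t * t for t in ts)
    s_y = sum(ys)
    s_ty = sum(t * y for t, y in zip(ts, ys))
    a_n = a0 + 1.0          # a0 + M/2 with M = 2 weights
    e_alpha = a0 / b0       # initial expected weight precision
    m0 = m1 = 0.0
    for _ in range(iters):
        # S^{-1} = E[alpha] * I + beta * Phi^T Phi (a 2x2 matrix).
        p00 = e_alpha + beta * n
        p01 = beta * s_t
        p11 = e_alpha + beta * s_tt
        det = p00 * p11 - p01 * p01
        s00, s01, s11 = p11 / det, -p01 / det, p00 / det
        # m = beta * S * Phi^T y.
        m0 = beta * (s00 * s_y + s01 * s_ty)
        m1 = beta * (s01 * s_y + s11 * s_ty)
        # b_n = b0 + E[w^T w] / 2, where E[w^T w] = m^T m + trace(S).
        b_n = b0 + 0.5 * (m0 * m0 + m1 * m1 + s00 + s11)
        e_alpha = a_n / b_n
    return m0, m1


def predict_position(history, t_future):
    """Extrapolate an (x, y) position from a list of (t, x, y) samples."""
    ts = [h[0] for h in history]
    x0, x1 = fit_vblr(ts, [h[1] for h in history])
    y0, y1 = fit_vblr(ts, [h[2] for h in history])
    return x0 + x1 * t_future, y0 + y1 * t_future


if __name__ == "__main__":
    # Hypothetical recorded trajectory: the person walks in a straight line.
    history = [(t, 0.5 * t, 1.0 + 0.2 * t) for t in range(5)]
    print(predict_position(history, 6.0))
```

A robot that has lost sight of the target could drive toward the extrapolated position and resume detection there. The prior on the weights shrinks the fit slightly toward zero, so the prediction is close to, but not exactly, the least-squares extrapolation.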

We designed four experiments to demonstrate the proposed framework's success in
following the target person, avoiding collisions, and continuing to follow when the target
person is lost in difficult situations in the environment (Figure 4). Lastly, we report our
results with this framework in the RoboCup@Home2017 following tasks.


Figure  4.  Difficult  situation  for  robot  to  follow.  A,  B:  lost  target;  C:  wall  in  between 


1.  Following  performance  result: 


Figure  5.  a)  Following  the  whole  trajectory  of  the  target  person,  b)  Distance  between  robot  and 
target  person.  The  number  indicates  the  step  of  following  the  target  person 


Figure 5 shows the robot's trajectory. The blue dot (robot position) consistently
follows the person even when the person changes speed and direction. Moreover, in the
dotted squares X, Y, and Z, the target person makes dynamic movements such as wiggling from side
to side, moving in a narrow space, and even moving toward the robot and going past it.
Even so, our system robustly follows the target person within a distance of 2.5 m.
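
The distance criterion above can be checked from logged positions. The sketch below is illustrative only: it assumes robot and person positions are logged as (x, y) pairs in meters at the same steps, and the function names and sample coordinates are made up.

```python
import math

MAX_FOLLOW_DISTANCE_M = 2.5  # distance bound reported in the text


def follow_distances(robot_path, person_path):
    """Euclidean robot-to-person distance at each logged step."""
    return [
        math.hypot(rx - px, ry - py)
        for (rx, ry), (px, py) in zip(robot_path, person_path)
    ]


def within_bound(robot_path, person_path, bound=MAX_FOLLOW_DISTANCE_M):
    """True if the robot stayed within the bound at every step."""
    return all(d <= bound for d in follow_distances(robot_path, person_path))


if __name__ == "__main__":
    robot = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]    # made-up sample log
    person = [(1.5, 0.0), (1.0, 2.0), (3.0, 0.5)]   # made-up sample log
    print(follow_distances(robot, person), within_bound(robot, person))
```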

2.  Collision  avoidance  result: 


Figure  6.  Collision  avoidance.  Red  box:  close  trajectory  of  target.  Blue  box:  target  going  over  the 
obstacle.  Robot  robustly  following  with  reflex  control 

To test whether our system could perform collision avoidance while following, we placed
obstacles in the environment as depicted in Figure 6. First, in the red box, the target person
passes the obstacle very closely and quickly, so that the obstacle overlaps the person's
trajectory. In this case, the control system executed the dynamics control together with the
reflex module to avoid the obstacle. The blue box obstacle in Figure 6 was used to test
whether our action controller could avoid colliding with the obstacle in a difficult situation.
When the person went over the obstacle, the obstacle ended up between the
robot and the target person. In such a case, it is impossible for the robot to follow the target
with the dynamics control alone. However, our navigation control planned the path periodically
with respect to the person's distance and applied the reflex module when the robot came close to
the obstacle, so the robot completed following the person to the end.
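
One way to picture how the three controllers mentioned above could be arbitrated is a simple priority rule: reflex first, then navigation planning, then direct dynamics control. The sketch below is an assumption-laden illustration, not the project's controller; the thresholds, the replanning period, and the function name are all hypothetical.

```python
REFLEX_RANGE_M = 0.4       # assumed distance at which the reflex module takes over
REPLAN_PERIOD_STEPS = 10   # assumed number of control steps between path replans


def select_controller(step, obstacle_distance, line_of_sight_clear):
    """Pick which controller drives the robot base on this control step."""
    if obstacle_distance < REFLEX_RANGE_M:
        return "reflex"        # immediate avoidance has top priority
    if not line_of_sight_clear or step % REPLAN_PERIOD_STEPS == 0:
        return "navigation"    # plan a path around whatever blocks the target
    return "dynamics"          # direct pursuit of the target person


if __name__ == "__main__":
    # Obstacle between robot and person: direct pursuit is not selected.
    print(select_controller(step=3, obstacle_distance=1.0, line_of_sight_clear=False))
```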
3. Recovering following when the target is lost:


Figure 7. Left: Difficult situation for the robot to follow. Right: Predicted trajectory of the
target using momentum, maximum likelihood, and the proposed variational Bayesian linear
regression (VBLR), in panels labeled Momentum, Maximum Likelihood, and Proposed VBLR.
The blue line indicates the trajectory history of the target. The red X indicates
the current coordinate of the target.

We examined our method in two difficult situations where the robot could easily lose the
target person (Figure 7). Task A is a situation in which the person goes out the door and
immediately turns right. In this case, our perception module captured only a
slight view of the target person through the doorway (Figure 7, solid-lined box, top row). Task B
is a situation in which the robot totally loses perception of the target person because the person
hides behind the wall by turning left (Figure 7, dotted-lined box, bottom row). We compared our
proposed VBLR with two other methods, momentum and maximum likelihood (ML).

First, in task A, every method found the target. However, the gaps between the methods
were large: our method achieved almost real-time re-following in that situation.
Moreover, in task B, the other two methods failed to detect the target person. With the
momentum method, the robot was unable to move out of the doorway. The ML method
predicted that the trajectory went outside, but the robot went too far to recognize the target
person. In contrast, our VBLR succeeded in going out of the doorway and finding the target
within 3 seconds on average over 100 trials.

The video of DBTP can be found at https://youtu.be/F6211GhrbbE

Conclusion: 

We raised the hypothesis that the brain evolved to support mobility. As the
project progressed, we found that if any one of perception, action, or learning is missing or
malfunctioning, it is almost impossible for the robot to retain its full ability in a given
scenario. Moreover, we believe that even though perception is very important, it almost loses
its purpose for mobile robots in a home environment if the robot is unable to
perform actions in that environment. In the basic year of this project, we therefore built the basic
system for mobile robots to perceive, act, and learn within the environment. We believe that, using
this system as a base, higher functions like memory and planning can be developed,
bringing us a step closer to a truly human-level AI.
- J. Choi, B.-J. Lee, and B.-T. Zhang. “Multi-focus attention network for efficient deep
reinforcement learning.” AAAI 2017 Workshop on What's next for AI in games (WNAIG
2017), 2017
- S. Son, J. Kim, B.-T. Zhang. “Active Image Learning of Household Robots Using Bayesian
Neural Network.” Korean Institute of Information Scientists and Engineer, Winter
Conference, pp. 690-692, 2016.12.

- J. Kim. “Talking to Teach a Personal Service Robot to Get Acquainted with the Dynamically
Changing Home Environment.” In 2017 Korea society for Cognitive science, Annual
Conference. 2017.05. (poster)

-  S.  Son.  “Optimizing  the  Continual  Learning  of  Bayesian  Neural  Network.”  In  2017  Korea 
e) Award
Attachments: Publications a), b) and c) listed above.